(Private) Kernelized Bandits with Distributed Biased Feedback
Authors
Abstract
In this paper, we study kernelized bandits with distributed biased feedback. This problem is motivated by several real-world applications (such as dynamic pricing, cellular network configuration, and policy making), where users from a large population contribute to the reward of the action chosen by a central entity, but it is difficult to collect feedback from all users. Instead, only biased feedback (due to user heterogeneity) from a subset of users may be available. In addition to such partial biased feedback, we are also faced with two practical challenges due to communication cost and computation complexity. To tackle these challenges, we carefully design a new distributed phase-then-batch-based elimination (DPBE) algorithm, which samples users in phases for collecting feedback to reduce the bias and employs maximum variance reduction to select actions in batches within each phase. By properly choosing the phase length, the batch size, and the confidence width used for eliminating suboptimal actions, we show that DPBE achieves a sublinear regret of Õ(T^(1−α/2) + √(γ_T·T)), where α ∈ (0,1) is the user-sampling parameter one can tune and γ_T is the maximal information gain of the kernel. Moreover, DPBE can significantly reduce both the communication cost and the computation complexity in distributed kernelized bandits, compared to some variants of state-of-the-art algorithms (originally developed for standard kernelized bandits). Furthermore, by incorporating various differential privacy models (including the central, local, and shuffle models), we generalize DPBE to provide privacy guarantees for users participating in the distributed learning process. Finally, we conduct extensive simulations to validate our theoretical results and evaluate the empirical performance.
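The phase-then-batch structure described above can be illustrated in a few dozen lines. This is a minimal single-machine sketch, not the paper's DPBE algorithm: the RBF kernel, the doubling batch schedule, the confidence width `beta`, and all function names are assumptions, and the distributed sampling of biased user feedback is reduced to a noisy reward oracle.

```python
import numpy as np

def rbf_kernel(X, Y, ls=0.2):
    """Squared-exponential kernel between two 1-D action sets."""
    d = X[:, None] - Y[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def posterior(Xa, ya, queries, lam=1.0):
    """Regularized kernel-regression mean/std at the query actions."""
    Kinv = np.linalg.inv(rbf_kernel(Xa, Xa) + lam * np.eye(len(Xa)))
    ks = rbf_kernel(queries, Xa)
    mu = ks @ Kinv @ ya
    var = np.clip(1.0 - np.einsum('ij,jk,ik->i', ks, Kinv, ks), 0.0, None)
    return mu, np.sqrt(var)

def dpbe_sketch(f, actions, n_phases=6, beta=2.0, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    active = actions.copy()
    for phase in range(n_phases):
        batch = 2 ** (phase + 2)      # doubling batch size per phase
        X, y = [], []
        for _ in range(batch):
            # Maximum variance reduction: pull the active action whose
            # current posterior variance is largest.
            if X:
                _, sd = posterior(np.array(X), np.array(y), active)
                x = active[np.argmax(sd)]
            else:
                x = active[0]
            X.append(x)
            # Averaged (noisy, possibly biased) feedback from sampled users.
            y.append(f(x) + noise * rng.standard_normal())
        mu, sd = posterior(np.array(X), np.array(y), active)
        # Eliminate actions whose UCB falls below the best LCB.
        active = active[mu + beta * sd >= np.max(mu - beta * sd)]
    return active

# Toy run: the reward function peaks at x = 0.7.
acts = np.linspace(0.0, 1.0, 21)
surv = dpbe_sketch(lambda x: np.exp(-8.0 * (x - 0.7) ** 2), acts)
print(surv)
```

Each phase forgets earlier data, pulls the highest-variance active action in a batch, and then discards actions whose upper confidence bound falls below the best lower confidence bound, so the surviving set shrinks toward the optimum.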
منابع مشابه
On Kernelized Multi-armed Bandits
We consider the stochastic bandit problem with a continuous set of arms, with the expected reward function over the arms assumed to be fixed but unknown. We provide two new Gaussian process-based algorithms for continuous bandit optimization – Improved GP-UCB (IGP-UCB) and GP-Thompson sampling (GP-TS), and derive corresponding regret bounds. Specifically, the bounds hold when the expected reward...
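The GP-UCB acquisition rule referenced in this abstract — play the arm maximizing the posterior mean plus a scaled posterior standard deviation — can be sketched as follows. This is an illustrative toy, not IGP-UCB itself; the kernel, the schedule for β_t, and the helper `gp_posterior` are assumptions.

```python
import numpy as np

def gp_posterior(X, y, queries, ls=0.2, lam=0.1):
    """GP posterior mean/std under an RBF kernel (hypothetical helper)."""
    def k(A, B):
        return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ls) ** 2)
    Kinv = np.linalg.inv(k(X, X) + lam * np.eye(len(X)))
    ks = k(queries, X)
    mu = ks @ Kinv @ y
    var = np.clip(1.0 - np.einsum('ij,jk,ik->i', ks, Kinv, ks), 0.0, None)
    return mu, np.sqrt(var)

rng = np.random.default_rng(1)
arms = np.linspace(0.0, 1.0, 41)
f = lambda x: np.sin(3.0 * x)          # unknown reward, peak near x ≈ 0.52
X = [rng.uniform(0.0, 1.0)]
y = [f(X[0]) + 0.05 * rng.standard_normal()]
for t in range(1, 60):
    mu, sd = gp_posterior(np.array(X), np.array(y), arms)
    beta_t = 2.0 * np.log(len(arms) * (t + 1) ** 2)  # a simple UCB schedule
    x = arms[np.argmax(mu + np.sqrt(beta_t) * sd)]   # GP-UCB rule
    X.append(x)
    y.append(f(x) + 0.05 * rng.standard_normal())
best = X[int(np.argmax(y))]
print(best)
```

The exploration bonus √β_t·σ(x) shrinks for well-sampled arms, so pulls concentrate near the reward peak as the posterior tightens.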
Bandits with Delayed, Aggregated Anonymous Feedback
We study a variant of the stochastic K-armed bandit problem, which we call “bandits with delayed, aggregated anonymous feedback”. In this problem, when the player pulls an arm, a reward is generated, however it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The...
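The feedback model described here is easy to simulate: each pull generates a reward immediately, but the learner only observes, per round, the aggregated sum of rewards whose random delays expire in that round. A minimal sketch (the arm means, delay distribution, and uniform play are hypothetical):

```python
import numpy as np
from collections import defaultdict

# Each pull generates a reward immediately, but the player only observes,
# at the end of each round, the SUM of previously generated rewards whose
# random delays expire in that round -- with no arm labels attached.
rng = np.random.default_rng(0)
means = [0.2, 0.8]              # hypothetical Bernoulli arm means
arrivals = defaultdict(float)   # round -> aggregated reward arriving then
observed = []
for t in range(100):
    arm = int(rng.integers(2))                   # play some arm (uniform here)
    r = float(rng.random() < means[arm])         # reward generated now...
    arrivals[t + int(rng.integers(1, 6))] += r   # ...arrives 1-5 rounds later
    observed.append(arrivals.pop(t, 0.0))        # anonymous aggregated feedback
print(sum(observed))
```

Because arriving sums carry no arm labels, the learner cannot directly credit a reward to the pull that produced it, which is what makes this variant harder than the standard delayed-feedback setting.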
Online Learning with Feedback Graphs: Beyond Bandits
We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multiarmed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T -roun...
Threshold Bandits, With and Without Censored Feedback
We consider the Threshold Bandit setting, a variant of the classical multi-armed bandit problem in which the reward on each round depends on a piece of side information known as a threshold value. The learner selects one of K actions (arms), this action generates a random sample from a fixed distribution, and the action then receives a unit payoff in the event that this sample exceeds the thres...
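The payoff rule described here — a unit reward exactly when the chosen arm's sample exceeds the round's threshold — can be sketched directly (the Gaussian arms and fixed threshold are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)
arm_means = [0.0, 0.5, 1.0]   # hypothetical Gaussian arms (unit variance)

def play(arm, threshold):
    """Unit payoff iff the arm's sample exceeds the round's threshold."""
    sample = rng.normal(arm_means[arm], 1.0)
    return 1 if sample > threshold else 0

# 1000 plays of the highest-mean arm against a fixed threshold of 0.5; the
# mean payoff is P(N(1,1) > 0.5) ≈ 0.69 in expectation.
payoff = sum(play(arm=2, threshold=0.5) for _ in range(1000))
print(payoff)
```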
Combinatorial Multi-Armed Bandits with Filtered Feedback
Motivated by problems in search and detection we present a solution to a Combinatorial Multi-Armed Bandit (CMAB) problem with both heavy-tailed reward distributions and a new class of feedback, filtered semibandit feedback. In a CMAB problem an agent pulls a combination of arms from a set {1, ..., k} in each round, generating random outcomes from probability distributions associated with these ...
Journal
Journal title: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Year: 2023
ISSN: 2476-1249
DOI: https://doi.org/10.1145/3579318